Obtaining a large number of samples by conventional means costs time, effort and money, a problem faced by Artificial Intelligence (AI) application research in many fields, and a variety of sample augmentation methods have been proposed in response. Firstly, the research background and significance of data augmentation were introduced. Then, data augmentation methods in several common fields (including natural image recognition, character recognition and discourse parsing) were summarized, and on this basis, a detailed overview was given of sample acquisition and augmentation methods in medical image-assisted diagnosis, covering X-ray, Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) images. Finally, the key issues of data augmentation methods in AI application fields were summarized and future development trends were discussed. It is concluded that obtaining a sufficient number of broadly representative training samples is key to research and development in all AI fields. Sample augmentation is practiced in both common and specialized fields, and different fields, or even different research directions within the same field, use different sample acquisition or augmentation methods. In addition, sample augmentation is not simply a matter of increasing the number of samples; it aims to reproduce, as far as possible, the real samples that a small sample set cannot fully cover, so as to improve sample diversity and enhance AI system performance.
To address the low efficiency of the serial PageRank algorithm on massive Web data, a parallel PageRank algorithm based on Web link classification was proposed. Firstly, Web pages were classified according to their links, and pages from different websites were assigned different weights. Secondly, page ranks were computed in parallel on the Hadoop platform using MapReduce, which follows a divide-and-conquer model. Finally, a three-layer data compression method, covering the data layer, pretreatment layer and computation layer, was adopted to optimize the parallel algorithm. The experimental results show that, compared with the serial PageRank algorithm, the proposed algorithm improves accuracy by 12% and efficiency by 33% in the best case.
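The map and reduce steps of a link-weighted PageRank can be sketched in plain Python as follows. This is a minimal single-process illustration, not the Hadoop implementation described in the abstract. The function names, the `site_weight` rule and its values (0.5 for same-site links, 1.0 for cross-site links) are hypothetical, since the abstract does not state how the weights were set. The sketch also assumes that every link target appears as a key in `graph`.

```python
from collections import defaultdict


def site_weight(source, target):
    # Hypothetical weighting rule: links that stay on the same site count less.
    same_site = source.split("/")[0] == target.split("/")[0]
    return 0.5 if same_site else 1.0


def map_phase(graph, ranks, link_weight):
    # Emit one (target, contribution) pair per outgoing link,
    # splitting a page's rank in proportion to the link weights.
    for page, targets in graph.items():
        if not targets:
            continue
        total = sum(link_weight(page, t) for t in targets)
        for t in targets:
            yield t, ranks[page] * link_weight(page, t) / total


def reduce_phase(pairs, pages, damping, dangling_mass):
    # Sum the contributions per target page and apply the damping factor.
    sums = defaultdict(float)
    for target, contribution in pairs:
        sums[target] += contribution
    n = len(pages)
    return {
        p: (1.0 - damping) / n + damping * (sums[p] + dangling_mass / n)
        for p in pages
    }


def weighted_pagerank(graph, link_weight, damping=0.85, iterations=50):
    pages = list(graph)
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread uniformly.
        dangling = sum(ranks[p] for p in pages if not graph[p])
        pairs = map_phase(graph, ranks, link_weight)
        ranks = reduce_phase(pairs, pages, damping, dangling)
    return ranks
```

In a real MapReduce job, `map_phase` would run independently on partitions of the link graph and `reduce_phase` would run per target key, which is what makes the computation parallel.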
To improve the accuracy of bird sound recognition in low Signal-to-Noise Ratio (SNR) environments, a new bird sound recognition method was proposed that applies the Radon Transform (RT) and the Translation Invariant Discrete Wavelet Transform (TIDWT) to the spectrogram obtained after noise reduction. First, an improved multi-band spectral subtraction method was presented to reduce the background noise. Second, short-time energy was used to detect silent segments in the denoised bird sound, and these segments were removed. Then, the bird sound was converted into a spectrogram, and RT and TIDWT were used to extract features. Finally, classification was performed by a Support Vector Machine (SVM) classifier. The experimental results show that the method achieves better recognition performance even when the SNR is below 10 dB.
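The silence-removal step based on short-time energy can be illustrated with a short NumPy sketch. It covers only that one step, not the noise reduction or feature extraction. The function name `remove_silence`, the frame length of 256 samples and the relative threshold of 5% of the peak frame energy are assumptions chosen for illustration; the abstract does not give the actual settings.

```python
import numpy as np


def remove_silence(signal, frame_len=256, rel_threshold=0.05):
    # Split the signal into non-overlapping frames; a trailing partial
    # frame is dropped to keep the sketch simple.
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return signal[:0]
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Short-time energy: sum of squared samples in each frame.
    energy = np.sum(frames ** 2, axis=1)

    # Keep frames whose energy exceeds a fraction of the peak frame energy
    # (assumed thresholding rule).
    keep = energy > rel_threshold * energy.max()
    return frames[keep].reshape(-1)
```

A practical system would typically use overlapping, windowed frames and a threshold tuned on recordings, but the principle of discarding low-energy frames is the same.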